Classification

Advanced Analytics with R (UG 21-24)

Ayush Patel

Before we start

Please load the following packages

library(tidyverse)
library(MASS)
library(ISLR)
library(ISLR2)
library(nnet)  ## install this if you don't have it
library(e1071) ## install this if you don't have it



Access lecture slides at bit.ly/aar-ug

Image: Warrior's armor (gusoku). Source: Armor (Gusoku)

Hello

I am Ayush.

I am a researcher working at the intersection of data, law, development and economics.

I teach Data Science using R at the Gokhale Institute of Politics and Economics.

I am an RStudio (Posit) certified tidyverse instructor.

I am a Researcher at the Oxford Poverty and Human Development Initiative (OPHI) at the University of Oxford.

Reach me

ayush.ap58@gmail.com

ayush.patel@gipe.ac.in

Learning Objective

Dip our toes into classification techniques: how to apply these methods and how to assess them.

References for this lecture:

  • Chapter 4, ISLR (reference)
  • Chapter 9, Intro to Modern Statistics (reading for intuitive understanding)
  • Chapter 10.2, Modern Data Science with R

What is Classification?

  • Predict a qualitative response.
  • The process of predicting a qualitative response is called classification.
  • A method or technique used for this can be referred to as a classifier.
  • We will look into logistic regression, linear discriminant analysis, quadratic discriminant analysis, naive Bayes, and K-nearest neighbours.

What actually happens

….often the methods used for classification first predict the probability that the observation belongs to each of the categories of a qualitative variable, as the basis for making the classification. In this sense they also behave like regression methods.

Why not use linear regression??

  • Nominal categorical variables have no rank, so how do we assign them quantitative values?
  • Distances between ordinal variable values are not easy to assign.
  • We could make something work when the response is nominal with only two levels.
  • Even then, there is no guarantee that our estimates will lie in [0,1], which makes interpreting them as probabilities difficult.

Default data

default student balance income
No No 729.5265 44361.625
No Yes 817.1804 12106.135
No No 1073.5492 31767.139
No No 529.2506 35704.494
No No 785.6559 38463.496
No Yes 919.5885 7491.559
No No 825.5133 24905.227
No Yes 808.6675 17600.451
No No 1161.0579 37468.529
No No 0.0000 29275.268
No Yes 0.0000 21871.073
No Yes 1220.5838 13268.562
No No 237.0451 28251.695
No No 606.7423 44994.556
No No 1112.9684 23810.174
No No 286.2326 45042.413
No No 0.0000 50265.312
No Yes 527.5402 17636.540
No No 485.9369 61566.106
No No 1095.0727 26464.631

Logistic Regression

  • Logistic regression is well suited for qualitative binary responses.
  • The default variable from Default is our response (\(Y\)).
  • It has two levels: Yes or No.
  • We model the probability that \(Y\) belongs to a particular category.
  • The logistic model estimates \(Pr(default = Yes|balance)\), also written \(p(balance)\).
  • A threshold \(a\), with \(0 \le a \le 1\), is chosen depending on risk-aversion behaviour; we predict Yes when \(p(balance) > a\).
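The thresholding step can be sketched in base R. The probabilities and the cut-off `a` below are made-up values for illustration, not output from the Default model:

```r
## hypothetical fitted probabilities p(balance) for five customers
probs <- c(0.02, 0.10, 0.45, 0.60, 0.95)

## choose a threshold a in [0, 1]
a <- 0.5

## classify: predict "Yes" (default) when p(balance) > a
pred <- ifelse(probs > a, "Yes", "No")
pred
```

Lowering `a` flags more customers as likely defaulters; a risk-averse lender might choose a much smaller threshold.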

But what if ?

I ran this: \(p(balance) = \beta_0 + \beta_1X\)

## make a dummy for default

Default|>
  mutate(
    default_dumm = ifelse(
      default == "Yes",
      1,0
    )
  )-> def_dum

## regress dummy over balance and plot 

lm(default_dumm ~ balance, 
   data = def_dum)|>
  broom::augment()|>
  ggplot(aes(balance,default_dumm))+
  geom_point(alpha= 0.6)+
  geom_line(aes(balance, .fitted),
            colour = "red")+
  labs(
    title = "Linear regression fit to qualitative response",
    subtitle = "Yes =1, No = 0",
    y = "prob default status"
  )+
  theme_minimal() -> plot_linear

## Run the logistic regression

glm(
  default_dumm ~ balance,
  data = def_dum,
  family = binomial
)|>
  broom::augment(type.predict = "response")|>
  ggplot(aes(balance,default_dumm))+
  geom_point(alpha= 0.6)+
  geom_line(aes(balance, .fitted),
            colour = "red")+
  labs(
    title = "Logistic regression fit to qualitative response",
    subtitle = "Yes =1, No = 0",
    y = "prob default status"
  )+
  theme_minimal() -> logistic_plot

Logistic Model

We saw that some fitted values in the linear model were negative.

We need a function that will return values in [0,1].

\[p(X) = \frac{e^{(\beta_0 + \beta_1X)}}{1+e^{\beta_0 + \beta_1X}}\]

This is the logistic function; its coefficients are fitted by the maximum likelihood method.

odds:

\[\frac{p(X)}{1-p(X)}\]

log odds or logit:

\[log(\frac{p(X)}{1-p(X)}) = \beta_0 + \beta_1X\]
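A quick numeric check of the odds and the logit, using a made-up probability:

```r
p <- 0.2             # a hypothetical probability of default
odds <- p / (1 - p)  # 0.2 / 0.8 = 0.25
logit <- log(odds)   # the log odds

## qlogis() is base R's built-in logit; it should agree
all.equal(logit, qlogis(p))
```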

Exercise - concept

If the following are the results of the model \(logit(p(default)) = \beta_0 + \beta_1Balance\):

term        estimate      std.error    statistic p.value
(Intercept) -10.651330614 0.3611573721 -29.49221 3.623124e-191
balance     0.005498917   0.0002203702 24.95309  1.976602e-137

What is the probability of default with a balance of $5,000?
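Not to give away the answer for $5,000, here is the same calculation at a different, arbitrary balance of $2,000, using the coefficients from the table above:

```r
b0 <- -10.651330614   # intercept from the table
b1 <- 0.005498917     # balance coefficient

balance <- 2000
eta <- b0 + b1 * balance   # the logit (log odds)
p <- plogis(eta)           # inverse logit: exp(eta) / (1 + exp(eta))
round(p, 3)                # roughly 0.586
```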

Multiple Logistic Regression

\[p(X) = \frac{e^{(\beta_0 + \beta_1X_1 + \beta_2X_2+...+\beta_nX_n)}}{1+e^{\beta_0 + \beta_1X_1 + \beta_2X_2+...+\beta_nX_n}}\]

Model with income, balance, and student:

term        estimate      std.error    statistic  p.value
(Intercept) -1.086905e+01 4.922555e-01 -22.080088 4.911280e-108
income      3.033450e-06  8.202615e-06 0.369815   7.115203e-01
balance     5.736505e-03  2.318945e-04 24.737563  4.219578e-135
studentYes  -6.467758e-01 2.362525e-01 -2.737646  6.188063e-03

Model with student only:

term        estimate   std.error  statistic  p.value
(Intercept) -3.5041278 0.07071301 -49.554219 0.0000000000
studentYes  0.4048871  0.11501883 3.520181   0.0004312529

How to know if it's good?

There is no consensus in the statistics community on a single measure of goodness of fit for logistic regression.

glm(
  default_dumm ~ income + balance + student,
  data = def_dum,
  family = binomial
) -> mod_logit

DescTools::PseudoR2(mod_logit,
                    which = c("McFadden", "CoxSnell",
                              "Nagelkerke", "Tjur"))
  McFadden   CoxSnell Nagelkerke       Tjur 
 0.4619194  0.1262059  0.4982860  0.3355203 
AIC(mod_logit) # be careful with this
[1] 1579.545

Exercise

Use the Credit data in {ISLR}.

  • Create three new variables:
    • mark_south (1 if Region is South, else 0)
    • mark_west (1 if Region is West, else 0)
    • mark_east (1 if Region is East, else 0)
  • Create three binomial logistic models, one for each newly created variable.
  • Get \(PseudoR^2\) for each model.

What you just did is called a stratified binary model.

  • \(n\) models are created to understand probabilities related to the \(n\) levels of the categorical response variable.
  • The \(n\) models are not comparable with one another.
  • Relative probabilities amongst the \(n\) levels of the response are not known.

Relative Risk or Baseline Approach to Multinomial Logistic Regression

\[Pr(Y=k|X=x) = \frac{e^{\beta_{k0}+\beta_{k1}x_1+...+\beta_{kp}x_p}}{1+\sum_{l=1}^{K-1}e^{\beta_{l0}+\beta_{l1}x_1+...+\beta_{lp}x_p}}\]

for k = 1,…K-1, and

\[Pr(Y=K|X=x) = \frac{1}{1+\sum_{l=1}^{K-1}e^{\beta_{l0}+\beta_{l1}x_1+...+\beta_{lp}x_p}}\]

Multinomial Logistic

\[log(\frac{Pr(Y=k|X=x)}{Pr(Y=K|X=x)}) = \beta_{k0}+\beta_{k1}x_1+...+\beta_{kp}x_p\]

  • Which class is treated as the reference or baseline does not affect the fitted model, but the interpretation of the coefficients depends on it.

  • How to interpret this?
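One way to see the interpretation: exponentiating the linear predictors and normalising recovers the class probabilities. The linear-predictor values below are invented for illustration, not taken from a fitted model:

```r
## hypothetical linear predictors beta_k0 + beta_k1*x_1 + ... for the
## K-1 non-baseline classes, evaluated at one observation
eta_adelie    <- 1.2
eta_chinstrap <- -0.4

## the baseline class (e.g. Gentoo) has linear predictor 0 by construction
eta <- c(Gentoo = 0, Adelie = eta_adelie, Chinstrap = eta_chinstrap)

## relative-risk (softmax) formula from the slide above
probs <- exp(eta) / sum(exp(eta))
probs
sum(probs)   # the class probabilities sum to 1
```

A one-unit increase in \(x_j\) multiplies the odds of class \(k\) relative to the baseline by \(e^{\beta_{kj}}\).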

Data

Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Set Baseline/reference

palmerpenguins::penguins|>
  mutate(
    species = stats::relevel(species,
                             ref = "Gentoo")
  ) -> peng_ref

levels(peng_ref$species)
[1] "Gentoo"    "Adelie"    "Chinstrap"

Describe model

multi_log <- nnet::multinom(
  formula = species ~ body_mass_g + bill_length_mm + bill_depth_mm + flipper_length_mm + sex + island, 
  data = peng_ref
)
# weights:  27 (16 variable)
initial  value 365.837892 
iter  10 value 21.914358
iter  20 value 1.629266
iter  30 value 0.026372
final  value 0.000049 
converged

Peek into Summary - notice anything?

Call:
nnet::multinom(formula = species ~ body_mass_g + bill_length_mm + 
    bill_depth_mm + flipper_length_mm + sex + island, data = peng_ref)

Coefficients:
          (Intercept) body_mass_g bill_length_mm bill_depth_mm
Adelie       502.6573 -0.08755830     -20.075027      34.82987
Chinstrap   -434.3867 -0.02106537       6.332771     -16.48865
          flipper_length_mm   sexmale islandDream islandTorgersen
Adelie            0.5054518  33.23469    62.03886        144.9809
Chinstrap         1.7645190 -55.22699   335.85058         63.1425

Std. Errors:
          (Intercept) body_mass_g bill_length_mm bill_depth_mm
Adelie      0.5314853    2.351402       29.93540      5.286822
Chinstrap   0.5310960    4.080649       29.91681      5.278463
          flipper_length_mm   sexmale islandDream islandTorgersen
Adelie             49.88305 0.2294146    0.531096    4.701009e-47
Chinstrap          49.81079 0.2290253    0.531096   4.261135e-130

Residual Deviance: 9.874339e-05 
AIC: 32.0001 

Getting p-values

# calculate z-statistics of coefficients
z_stats <- summary(multi_log)$coefficients/
  summary(multi_log)$standard.errors

# convert to p-values
p_values <- (1 - pnorm(abs(z_stats)))*2


# display p-values in transposed data frame
data.frame(t(p_values))
                        Adelie   Chinstrap
(Intercept)       0.000000e+00 0.000000000
body_mass_g       9.702963e-01 0.995881131
bill_length_mm    5.024680e-01 0.832357200
bill_depth_mm     4.456258e-11 0.001785562
flipper_length_mm 9.919154e-01 0.971741303
sexmale           0.000000e+00 0.000000000
islandDream       0.000000e+00 0.000000000
islandTorgersen   0.000000e+00 0.000000000

Fitted values

           Gentoo        Adelie     Chinstrap
1   1.565008e-135  1.000000e+00 1.009721e-242
2    3.833780e-97  1.000000e+00 1.450741e-166
3   3.913549e-122  1.000000e+00 1.006490e-181
5   3.854489e-165  1.000000e+00 2.652195e-247
6   2.628864e-168  1.000000e+00 9.671388e-281
7   5.558841e-114  1.000000e+00 3.782674e-190
8   3.576898e-116  1.000000e+00 2.880335e-227
13  3.717985e-108  1.000000e+00 3.470609e-172
... (remaining rows omitted; every observation is assigned probability ≈ 1 for a single class)

Exercise

Use the Publication data from ISLR2.

Split the data randomly into an 80%-20% training and test set.

Fit a multinomial logistic model to classify the variable mech.

Use the test data to predict the mech variable and judge whether the fit is reasonable.

Test Error Rate

What test error rate did you get for the previous exercise?

\[Ave(I(y_0 \neq \hat y_0))\]
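The formula is just the share of misclassified test observations. A minimal sketch with toy labels:

```r
## hypothetical true and predicted classes on a test set
truth <- c("a", "b", "b", "a")
pred  <- c("a", "b", "a", "a")

## Ave(I(y0 != yhat0)): the average of the misclassification indicator
test_error <- mean(pred != truth)
test_error   # 0.25: one of four observations is misclassified
```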

The Bayes Classifier

“a classifier that assigns each observation to the most likely class, given its predictor values” minimizes the test error rate.

  • The lowest achievable test error rate is called the Bayes error rate.

  • Bayes Decision Boundary

  • Why not always use Bayes Classifier?

Moving Forward

Keep in mind the good old Bayes Rule

\[P(A|B) = \frac{P(B|A)* P(A)}{P(B)}\]
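A quick numeric instance of the rule, with made-up probabilities:

```r
p_A       <- 0.3   # P(A)
p_B_gA    <- 0.8   # P(B|A)
p_B_gnotA <- 0.1   # P(B|not A)

## law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_B <- p_B_gA * p_A + p_B_gnotA * (1 - p_A)

## Bayes rule
p_A_gB <- p_B_gA * p_A / p_B
p_A_gB   # 0.24 / 0.31
```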

Generative Models - What ?

  • We saw that the logistic model estimates \(Pr(Y=k|X=x)\) directly.
  • Alternatively, we can model the distribution of the predictors within each class of Y.
  • Then we use Bayes' rule to obtain \(Pr(Y=k|X=x)\).
  • “When the distribution of X within each class is assumed to be normal, it turns out that the model is very similar in form to logistic regression”

Generative Models - Why?

  • For logistic regression, parameter estimates are unstable when the separation between the two classes is substantial.
  • When the distribution of X within each class of Y is approximately normal and the sample size is small, these methods can do better than logistic regression.

Generative Models - How?

\(\pi_k\) is the overall, or prior, probability that a randomly chosen observation comes from the \(k^{th}\) class of the response.

\(f_k(X) = Pr(X|Y=k)\)

\[Pr(Y=k|X=x) = \frac{\pi_k*f_k(x)}{\sum_{l=1}^K\pi_lf_l(x)}\]

We are trying to approximate the Bayes classifier! We will explore linear discriminant analysis, quadratic discriminant analysis, and naive Bayes.
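The posterior formula above can be evaluated directly in base R. The priors, class means, and observed x below are invented numbers, assuming normal \(f_k\):

```r
pi_k <- c(0.5, 0.3, 0.2)   # priors pi_1..pi_3 (must sum to 1)
mu_k <- c(-2, 0, 3)        # class means
sd_k <- c(1, 1, 1)         # class standard deviations

x <- 0.5                   # observed predictor value

## f_k(x) = Pr(X = x | Y = k), assuming a normal density per class
f_k <- dnorm(x, mean = mu_k, sd = sd_k)

## Bayes rule: Pr(Y = k | X = x)
posterior <- pi_k * f_k / sum(pi_k * f_k)
posterior
sum(posterior)   # always 1
```

The Bayes classifier would assign x = 0.5 to the class with the largest posterior (here class 2, whose mean is closest).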

LDA for One predictor

The overarching goal is to figure out \(f_k(x)\).

To achieve this, we assume that \(f_k(x)\) is normal:

\[f_k(x) = \frac{1}{\sigma_k\sqrt{2\pi}}exp(-\frac{1}{2\sigma_k^2}(x-\mu_k)^2)\]

Here, \(\mu_k\) and \(\sigma_k^2\) are the mean and variance parameters of the \(k^{th}\) class.

We also assume that \(\sigma_1^2 = \dots = \sigma_K^2\): all classes share a common variance.

LDA for One Predictor

\[ Pr(Y=k|X=x) = \frac{\pi_k\frac{1}{\sigma\sqrt{2\pi}}exp(-\frac{1}{2\sigma^2}(x-\mu_k)^2)}{\sum_{l=1}^K\pi_l\frac{1}{\sigma\sqrt{2\pi}}exp(-\frac{1}{2\sigma^2}(x-\mu_l)^2)} \]

Taking logs and dropping terms that do not depend on \(k\) gives the discriminant function; we assign an observation to the class with the largest value:

\[ \delta_k(x) = x\cdot\frac{\mu_k}{\sigma^2}-\frac{\mu_k^2}{2\sigma^2} + log(\pi_k) \]

For \(K=2\) and \(\pi_1 = \pi_2\), the decision boundary is the midpoint of the class means:

\[ x = \frac{\mu_1^2-\mu_2^2}{2(\mu_1-\mu_2)}= \frac{\mu_1 + \mu_2}{2} \]
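A numeric check of the boundary identity for two classes with equal priors, using made-up means:

```r
mu1 <- 5
mu2 <- 1

## the two expressions for the decision boundary agree
boundary <- (mu1^2 - mu2^2) / (2 * (mu1 - mu2))
boundary                                # 3
all.equal(boundary, (mu1 + mu2) / 2)    # midpoint of the two means
```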

Applying LDA

lda_default_balance_student <-
  MASS::lda(default ~ balance + student, data = Default)
lda_default_balance_student
Call:
lda(default ~ balance + student, data = Default)

Prior probabilities of groups:
    No    Yes 
0.9667 0.0333 

Group means:
      balance studentYes
No   803.9438  0.2914037
Yes 1747.8217  0.3813814

Coefficients of linear discriminants:
                    LD1
balance     0.002244397
studentYes -0.249059498

Applying LDA - training error rate

mean(
  predict(lda_default_balance_student,
          newdata = Default)$class != Default$default
)
[1] 0.0275
  • This is the training error rate.

  • Compare it with the trivial null classifier that always predicts No: since only 3.33% of customers default, it would achieve an error rate of just 3.33%.

predict(lda_default_balance_student,
          newdata = Default)|>names()
[1] "class"     "posterior" "x"        

Exercise

  • See the OJ data set in ISLR2.

  • Use this data set to predict the variable Purchase.

  • Split the data into 80/20 training and testing sets.

  • Use the training data to develop an LDA model. Use ROC curves and a confusion matrix to gauge model effectiveness. Fine-tune the model. See Chapter 9 of TMWR.

  • Predict on the test data with the fine-tuned model.

QDA

Quadratic Discriminant Analysis

  • QDA, too, assumes that observations within each class are drawn from a Gaussian distribution.
  • However, QDA does not assume a common covariance matrix across classes. This is where it differs from LDA.
  • As a result, \(x\) appears as a quadratic term in the discriminant function.
  • Now \(Kp(p+1)/2\) parameters must be estimated for the covariance matrices instead of \(p(p+1)/2\). This is where the bias-variance trade-off comes into play.
  • LDA therefore has lower variance but can suffer from high bias, especially if the common-covariance assumption is badly off.
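The parameter counts are easy to check; the values of p and K below are arbitrary:

```r
p <- 5   # number of predictors (arbitrary)
K <- 3   # number of classes (arbitrary)

lda_cov_params <- p * (p + 1) / 2       # one shared covariance matrix
qda_cov_params <- K * p * (p + 1) / 2   # one covariance matrix per class

c(LDA = lda_cov_params, QDA = qda_cov_params)   # 15 vs 45
```

The gap widens quickly with p, which is why QDA needs considerably more data than LDA to estimate its parameters stably.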

Exercise

See the Smarket data in ISLR2.

Split in 80/20 training and testing.

Train LDA and QDA models.

Test these models and compare the results using the test error rate.

What happens if you take n number of training data sets and n number of testing data sets, run LDA and QDA on each pair and plot training error rate and testing error rate distributions?

Naive Bayes

  • From LDA and QDA we have seen that estimating \(\pi_1,\dots,\pi_K\) is easy.
  • Estimating \(f_1(x),\dots,f_K(x)\) is difficult.
  • The distributional assumptions of LDA and QDA save us from estimating K separate p-dimensional density functions.
  • The naive Bayes classifier instead makes a single assumption: within the \(k^{th}\) class, the p predictors are independent.

\[f_k(x) = f_{k1}(x_1)*f_{k2}(x_2)*...*f_{kp}(x_p)\]

Naive Bayes

\[Pr(Y=k|X=x) = \frac{\pi_k*f_{k1}(x_1)*f_{k2}(x_2)*...*f_{kp}(x_p)}{\sum_{l=1}^K \pi_l*f_{l1}(x_1)*f_{l2}(x_2)*...*f_{lp}(x_p)}\]

> How is \(f_{kj}\) estimated?
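A sketch of the independence product for a single class k, with invented per-predictor normal densities:

```r
x <- c(1.0, -0.5, 2.0)   # one observation with p = 3 predictors

## class-k marginal parameters (invented); naive Bayes estimates one
## univariate density per predictor rather than a joint density
mu_k <- c(0.8, 0.0, 1.5)
sd_k <- c(1.0, 0.5, 2.0)

## f_k(x) = f_k1(x1) * f_k2(x2) * ... * f_kp(xp)
f_k <- prod(dnorm(x, mean = mu_k, sd = sd_k))
f_k
```

Plugging such products into the formula above, one per class, and normalising gives the naive Bayes posteriors.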

Exercise

Use the naiveBayes function from the e1071 package.

Use the Smarket data and compare the results with QDA.

Estimating Discrete Values

Which method should we use when the response is numeric but always takes non-negative integer values?

Does Linear Regression work?
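As a preview of the problem, a sketch with simulated hourly counts (not the Bikeshare data): a linear model can produce negative fitted counts, while a Poisson GLM cannot.

```r
set.seed(42)
n <- 200
temp   <- runif(n)                               # simulated predictor
counts <- rpois(n, lambda = exp(-1 + 3 * temp))  # simulated non-negative counts

lin  <- lm(counts ~ temp)
pois <- glm(counts ~ temp, family = poisson)

## linear fitted values can dip below zero; Poisson fitted values cannot
c(min_linear = min(fitted(lin)), min_poisson = min(fitted(pois)))
```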

Data: Bikeshare

Rows: 8,645
Columns: 15
$ season     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ mnth       <fct> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan,…
$ day        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ hr         <fct> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ holiday    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ weekday    <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,…
$ workingday <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ weathersit <fct> clear, clear, clear, clear, clear, cloudy/misty, clear, cle…
$ temp       <dbl> 0.24, 0.22, 0.22, 0.24, 0.24, 0.24, 0.22, 0.20, 0.24, 0.32,…
$ atemp      <dbl> 0.2879, 0.2727, 0.2727, 0.2879, 0.2879, 0.2576, 0.2727, 0.2…
$ hum        <dbl> 0.81, 0.80, 0.80, 0.75, 0.75, 0.75, 0.80, 0.86, 0.75, 0.76,…
$ windspeed  <dbl> 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0896, 0.0000, 0.0…
$ casual     <dbl> 3, 8, 5, 3, 0, 0, 2, 1, 1, 8, 12, 26, 29, 47, 35, 40, 41, 1…
$ registered <dbl> 13, 32, 27, 10, 1, 1, 0, 2, 7, 6, 24, 30, 55, 47, 71, 70, 5…
$ bikers     <dbl> 16, 40, 32, 13, 1, 1, 2, 3, 8, 14, 36, 56, 84, 94, 106, 110…

Does Linear Regression work?


Call:
lm(formula = bikers ~ workingday + temp + weathersit + mnth + 
    hr, data = Bikeshare)

Residuals:
    Min      1Q  Median      3Q     Max 
-299.00  -45.70   -6.23   41.08  425.29 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                -68.632      5.307 -12.932  < 2e-16 ***
workingday                   1.270      1.784   0.711 0.476810    
temp                       157.209     10.261  15.321  < 2e-16 ***
weathersitcloudy/misty     -12.890      1.964  -6.562 5.60e-11 ***
weathersitlight rain/snow  -66.494      2.965 -22.425  < 2e-16 ***
weathersitheavy rain/snow -109.745     76.667  -1.431 0.152341    
mnthFeb                      6.845      4.287   1.597 0.110398    
mnthMarch                   16.551      4.301   3.848 0.000120 ***
mnthApril                   41.425      4.972   8.331  < 2e-16 ***
mnthMay                     72.557      5.641  12.862  < 2e-16 ***
mnthJune                    67.819      6.544  10.364  < 2e-16 ***
mnthJuly                    45.324      7.081   6.401 1.63e-10 ***
mnthAug                     53.243      6.640   8.019 1.21e-15 ***
mnthSept                    66.678      5.925  11.254  < 2e-16 ***
mnthOct                     75.834      4.950  15.319  < 2e-16 ***
mnthNov                     60.310      4.610  13.083  < 2e-16 ***
mnthDec                     46.458      4.271  10.878  < 2e-16 ***
hr1                        -14.579      5.699  -2.558 0.010536 *  
hr2                        -21.579      5.733  -3.764 0.000168 ***
hr3                        -31.141      5.778  -5.389 7.26e-08 ***
hr4                        -36.908      5.802  -6.361 2.11e-10 ***
hr5                        -24.135      5.737  -4.207 2.61e-05 ***
hr6                         20.600      5.704   3.612 0.000306 ***
hr7                        120.093      5.693  21.095  < 2e-16 ***
hr8                        223.662      5.690  39.310  < 2e-16 ***
hr9                        120.582      5.693  21.182  < 2e-16 ***
hr10                        83.801      5.705  14.689  < 2e-16 ***
hr11                       105.423      5.722  18.424  < 2e-16 ***
hr12                       137.284      5.740  23.916  < 2e-16 ***
hr13                       136.036      5.760  23.617  < 2e-16 ***
hr14                       126.636      5.776  21.923  < 2e-16 ***
hr15                       132.087      5.780  22.852  < 2e-16 ***
hr16                       178.521      5.772  30.927  < 2e-16 ***
hr17                       296.267      5.749  51.537  < 2e-16 ***
hr18                       269.441      5.736  46.976  < 2e-16 ***
hr19                       186.256      5.714  32.596  < 2e-16 ***
hr20                       125.549      5.704  22.012  < 2e-16 ***
hr21                        87.554      5.693  15.378  < 2e-16 ***
hr22                        59.123      5.689  10.392  < 2e-16 ***
hr23                        26.838      5.688   4.719 2.41e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 76.5 on 8605 degrees of freedom
Multiple R-squared:  0.6745,    Adjusted R-squared:  0.6731 
F-statistic: 457.3 on 39 and 8605 DF,  p-value: < 2.2e-16
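One quick way to see the problem with this fit, assuming the model above is stored as `bikers_lm`: a count response can never be negative, yet a linear model is free to produce negative fitted values.

```r
library(ISLR2)  # Bikeshare data

# same specification as the summary above
bikers_lm <- lm(bikers ~ workingday + temp + weathersit + mnth + hr,
                data = Bikeshare)

# count how many fitted values are impossible (negative counts)
sum(fitted(bikers_lm) < 0)
```

A nonzero count here is direct evidence that the linear model is mis-specified for count data.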

Does Linear Regression work?

Does Linear Regression work?

What if we adjust for the non-constant variance of \(\epsilon\) with \(Y\) by log-transforming the response?


Call:
lm(formula = log(bikers) ~ workingday + temp + weathersit + mnth + 
    hr, data = Bikeshare)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2919 -0.3038  0.0450  0.3807  2.5641 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                2.40308    0.04404  54.563  < 2e-16 ***
workingday                -0.02036    0.01481  -1.375 0.169169    
temp                       1.05865    0.08516  12.432  < 2e-16 ***
weathersitcloudy/misty    -0.05990    0.01630  -3.674 0.000240 ***
weathersitlight rain/snow -0.68523    0.02461 -27.845  < 2e-16 ***
weathersitheavy rain/snow -0.79376    0.63626  -1.248 0.212235    
mnthFeb                    0.23106    0.03558   6.494 8.84e-11 ***
mnthMarch                  0.30883    0.03570   8.652  < 2e-16 ***
mnthApril                  0.63591    0.04127  15.410  < 2e-16 ***
mnthMay                    0.91154    0.04682  19.470  < 2e-16 ***
mnthJune                   0.85752    0.05431  15.791  < 2e-16 ***
mnthJuly                   0.76458    0.05877  13.010  < 2e-16 ***
mnthAug                    0.77030    0.05510  13.979  < 2e-16 ***
mnthSept                   0.85967    0.04917  17.483  < 2e-16 ***
mnthOct                    0.91447    0.04108  22.259  < 2e-16 ***
mnthNov                    0.80497    0.03826  21.041  < 2e-16 ***
mnthDec                    0.63938    0.03544  18.040  < 2e-16 ***
hr1                       -0.61508    0.04729 -13.005  < 2e-16 ***
hr2                       -1.11341    0.04758 -23.402  < 2e-16 ***
hr3                       -1.68041    0.04795 -35.042  < 2e-16 ***
hr4                       -1.99993    0.04815 -41.532  < 2e-16 ***
hr5                       -1.05245    0.04761 -22.106  < 2e-16 ***
hr6                        0.18048    0.04734   3.813 0.000138 ***
hr7                        1.14734    0.04725  24.285  < 2e-16 ***
hr8                        1.81391    0.04722  38.415  < 2e-16 ***
hr9                        1.53239    0.04724  32.436  < 2e-16 ***
hr10                       1.22379    0.04735  25.847  < 2e-16 ***
hr11                       1.34852    0.04749  28.397  < 2e-16 ***
hr12                       1.53880    0.04764  32.302  < 2e-16 ***
hr13                       1.53233    0.04780  32.055  < 2e-16 ***
hr14                       1.46830    0.04794  30.629  < 2e-16 ***
hr15                       1.50923    0.04797  31.463  < 2e-16 ***
hr16                       1.76166    0.04790  36.775  < 2e-16 ***
hr17                       2.17604    0.04771  45.612  < 2e-16 ***
hr18                       2.08322    0.04760  43.765  < 2e-16 ***
hr19                       1.79162    0.04742  37.781  < 2e-16 ***
hr20                       1.49547    0.04734  31.593  < 2e-16 ***
hr21                       1.23022    0.04725  26.036  < 2e-16 ***
hr22                       0.99079    0.04722  20.984  < 2e-16 ***
hr23                       0.58547    0.04720  12.403  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6349 on 8605 degrees of freedom
Multiple R-squared:  0.8092,    Adjusted R-squared:  0.8084 
F-statistic:   936 on 39 and 8605 DF,  p-value: < 2.2e-16

Does Linear Regression work?

What if we adjust for the non-constant variance of \(\epsilon\) with \(Y\) by log-transforming the response?

Poisson Distribution

\[Pr(Y=k) = \frac{e^{-\lambda}\lambda^k}{k!}\]

\(Y \in {0,1,2,3,4,...}\)

\(k = 0,1,2,3,4,...\)

\(\lambda > 0\) is the expected value of \(Y\).

\(\lambda = E(Y) = Var(Y)\)
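These properties are easy to check in R with `dpois()` and `rpois()`; the value \(\lambda = 2\) below is purely illustrative:

```r
lambda <- 2
k <- 0:10

# Pr(Y = k) = exp(-lambda) * lambda^k / k! for k = 0,1,...,10
round(dpois(k, lambda = lambda), 4)

# simulated draws: mean and variance should both be close to lambda
set.seed(1)
y <- rpois(1e5, lambda)
c(mean = mean(y), var = var(y))
```

The near-equality of the sample mean and variance illustrates the defining property \(E(Y) = Var(Y) = \lambda\).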

Poisson Regression

\(\lambda(X_1,...,X_p)\): the expected value of \(Y\) is modelled as a function of the p covariates.

\[log(\lambda(X_1,...,X_p)) = \beta_0 + \beta_1X_1+...+\beta_pX_p\] or

\[\lambda(X_1,...,X_p) = e^{\beta_0 + \beta_1X_1+...+\beta_pX_p}\]

Poisson Regression

The coefficients \(\beta_0,\beta_1,...,\beta_p\) are chosen to maximize the likelihood:

\[l(\beta_0,\beta_1,...\beta_p) = \prod_{i=1}^n\frac{e^{-\lambda(x_i)}\lambda(x_i)^{y_i}}{y_i!}\]
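In practice the maximization is done on the log scale, where the product becomes a sum; taking logs of the expression above gives the log-likelihood:

\[\log l(\beta_0,\beta_1,...\beta_p) = \sum_{i=1}^n \left( -\lambda(x_i) + y_i\log\lambda(x_i) - \log(y_i!) \right)\]

Since \(\log\lambda(x_i) = \beta_0 + \beta_1x_{i1}+...+\beta_px_{ip}\), this is a smooth concave function of the coefficients, which is what `glm()` maximizes numerically.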

Poisson Regression

bikers_poi <- glm(
  bikers ~ workingday + temp + weathersit + mnth + hr,
  data = Bikeshare,
  family = poisson
)

summary(bikers_poi)

Call:
glm(formula = bikers ~ workingday + temp + weathersit + mnth + 
    hr, family = poisson, data = Bikeshare)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-20.7574   -3.3441   -0.6549    2.6999   21.9628  

Coefficients:
                           Estimate Std. Error  z value Pr(>|z|)    
(Intercept)                2.693688   0.009720  277.124  < 2e-16 ***
workingday                 0.014665   0.001955    7.502 6.27e-14 ***
temp                       0.785292   0.011475   68.434  < 2e-16 ***
weathersitcloudy/misty    -0.075231   0.002179  -34.528  < 2e-16 ***
weathersitlight rain/snow -0.575800   0.004058 -141.905  < 2e-16 ***
weathersitheavy rain/snow -0.926287   0.166782   -5.554 2.79e-08 ***
mnthFeb                    0.226046   0.006951   32.521  < 2e-16 ***
mnthMarch                  0.376437   0.006691   56.263  < 2e-16 ***
mnthApril                  0.691693   0.006987   98.996  < 2e-16 ***
mnthMay                    0.910641   0.007436  122.469  < 2e-16 ***
mnthJune                   0.893405   0.008242  108.402  < 2e-16 ***
mnthJuly                   0.773787   0.008806   87.874  < 2e-16 ***
mnthAug                    0.821341   0.008332   98.573  < 2e-16 ***
mnthSept                   0.903663   0.007621  118.578  < 2e-16 ***
mnthOct                    0.937743   0.006744  139.054  < 2e-16 ***
mnthNov                    0.820433   0.006494  126.334  < 2e-16 ***
mnthDec                    0.686850   0.006317  108.724  < 2e-16 ***
hr1                       -0.471593   0.012999  -36.278  < 2e-16 ***
hr2                       -0.808761   0.014646  -55.220  < 2e-16 ***
hr3                       -1.443918   0.018843  -76.631  < 2e-16 ***
hr4                       -2.076098   0.024796  -83.728  < 2e-16 ***
hr5                       -1.060271   0.016075  -65.957  < 2e-16 ***
hr6                        0.324498   0.010610   30.585  < 2e-16 ***
hr7                        1.329567   0.009056  146.822  < 2e-16 ***
hr8                        1.831313   0.008653  211.630  < 2e-16 ***
hr9                        1.336155   0.009016  148.191  < 2e-16 ***
hr10                       1.091238   0.009261  117.831  < 2e-16 ***
hr11                       1.248507   0.009093  137.304  < 2e-16 ***
hr12                       1.434028   0.008936  160.486  < 2e-16 ***
hr13                       1.427951   0.008951  159.529  < 2e-16 ***
hr14                       1.379296   0.008999  153.266  < 2e-16 ***
hr15                       1.408149   0.008977  156.862  < 2e-16 ***
hr16                       1.628688   0.008805  184.979  < 2e-16 ***
hr17                       2.049021   0.008565  239.221  < 2e-16 ***
hr18                       1.966668   0.008586  229.065  < 2e-16 ***
hr19                       1.668409   0.008743  190.830  < 2e-16 ***
hr20                       1.370588   0.008973  152.737  < 2e-16 ***
hr21                       1.118568   0.009215  121.383  < 2e-16 ***
hr22                       0.871879   0.009536   91.429  < 2e-16 ***
hr23                       0.481387   0.010207   47.164  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1052921  on 8644  degrees of freedom
Residual deviance:  228041  on 8605  degrees of freedom
AIC: 281159

Number of Fisher Scoring iterations: 5

Poisson Regression

DescTools::PseudoR2(bikers_poi,
                    which = 
            c("McFadden","CoxSnell",
              "Nagelkerke"))
  McFadden   CoxSnell Nagelkerke 
 0.7458506  1.0000000  1.0000000 
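Because of the log link, Poisson coefficients act multiplicatively on the expected count: a one-unit change in a predictor multiplies \(\lambda\) by \(e^{\beta}\). Assuming the fitted model is stored as `bikers_poi` as above, exponentiating the coefficients gives these rate ratios directly:

```r
# e^beta: multiplicative effect on the expected number of bikers.
# e.g. the coefficient for light rain/snow gives the factor by which
# expected ridership changes relative to clear weather.
exp(coef(bikers_poi))[c("temp", "weathersitlight rain/snow")]
```

A value below 1 indicates the condition reduces expected ridership; a value above 1 indicates it increases it.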

Poisson Regression